perm filename 780.TIM[TIM,LSP] blob sn#640440 filedate 1982-02-08 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00006 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	∂07-May-81  2350	ROD  
C00011 00003	∂06-Apr-81  1153	RPG  	Times    
C00012 00004	∂29-Sep-81  1104	CSVAX.fateman at Berkeley 	Re:  Timing  
C00014 00005	∂05-Oct-81  1245	SL   
C00017 00006	∂28-Jan-82  1221	pratt@Shasta (SuNet) 	750 <--> 780 timings   
C00027 ENDMK
C⊗;
∂07-May-81  2350	ROD  
To:   TOB, RPG, BIS    
Date:  6 May 1981 2331-PDT
From: KASHTAN
Subject: VAX-11/750 <--> VAX-11/780 benchmarks
To: quam, witkin, hanson, jirak, wilcox, meyers, larson, kennard, sad,
    heathman at SRI-AI, ryland at SRI-AI, burback at SRI-AI, mcghie,
    sword

Here are the complete results of the 11/750 - 11/780 benchmarks.  Looks
like the 11/750 gets to memory faster (and is optimized w.r.t. getting
to memory faster) than the 11/780.  It loses VERY badly when it comes to
actually executing instructions, as the execution unit is very much slower
in the 750 than the 780.  This is particularly borne out by the execution
benchmarks for the convolution program in various languages.  The languages
vary from BLISS (which keeps the whole world in registers) to LISP (which
keeps the whole world in memory).  Even though the 750 gets to memory faster,
it doesn't do you much good when it takes so long to process what you got
from memory (even a simple move).
The 750 does a good job of operand processing (especially given its relative
CPU speed) but this doesn't seem to help too much in actual program execution,
as on the 750 the execution time seems to be dominated by the instruction
execution time rather than by the operand fetch time (as is the case on
the 780).
A note on Richard Fateman's 750 benchmarks.  Seems that all they did was
run a Liszt (Franz Lisp compiler) compile on one of Bell Labs' UNIX systems.
A compiled Franz Lisp program (as Liszt is) tends to be very heavy on CALLS
and on moving things around in memory (i.e. to and from the stack).  No
intermediate results are kept in registers at all.  What this does is skew
the results somewhat towards a faster looking 750 (since the 750 will benefit
from any benchmarks that are heavily involved in memory referencing).  What
he reported was that the 750 was indeed about 60% of the 780 in this case.
PLEASE NOTE that large IU and VLSI programs, while we might consider them
memory intensive, are really virtual memory intensive (i.e. have very large
working sets).  This is not the same as the above benchmark.  Most IU and
VLSI programs when compiled with good compilers will tend to do a small amount
of computation (even just an add or multiply) with each datum fetched from
memory.  You can expect the performance of the 750 relative to the 780 to
drop quite a bit from the above mentioned 60%.  It should become very much
like the following convolution benchmarks (a very good example of a virtual
memory intensive program that does a small amount of computation with each
datum fetched).  An interesting side note:  CARs and CDRs in compiled lisp
tend to come out as   "movl  x(r),dst"  (which executes at about 60% of 780
speed).
My feeling from playing with the two systems is that the 750 is best used
as an entry level system for those sites which need to acquire the smallest
possible VAX configuration (i.e. the lowest possible price).  An entry level
750 goes for about $90K while an entry level 780 system with approximately
the same configuration would go for about $140K.  Clearly there is a big
difference here (almost all of it in the price of the CPU).  As the systems
get larger the price advantage goes away (as the price will not be dominated
by the CPU price, which is the case in the smaller systems, but by memory /
peripheral prices).  Here you will save about $50K on a $250K system and get
less than 1/2 the machine.
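A rough way to put numbers on the price argument above.  This is only a
sketch: the prices are the approximate figures quoted in this message, the
45% relative speed is an assumption borrowed from the BLISS-32 and C
convolution results (not a measured system throughput), and the function
name is made up for illustration.

```python
# Sketch: price per unit of 11/780-equivalent performance.
# Prices ($K) are the approximate figures quoted in the message.
# ASSUMED_750_SPEED is an assumption taken from the BLISS-32 and C
# convolution results (about 45% of an 11/780), not a system-level measurement.
ASSUMED_750_SPEED = 0.45


def cost_per_780_unit(price_k, relative_speed):
    """Return $K paid per unit of 11/780-equivalent compute."""
    return price_k / relative_speed


entry_750 = cost_per_780_unit(90, ASSUMED_750_SPEED)
entry_780 = cost_per_780_unit(140, 1.0)
# "Save about $50K on a $250K system" implies a 750 system near $200K.
large_750 = cost_per_780_unit(250 - 50, ASSUMED_750_SPEED)
large_780 = cost_per_780_unit(250, 1.0)

for label, value in [("entry 750", entry_750), ("entry 780", entry_780),
                     ("large 750", large_750), ("large 780", large_780)]:
    print(f"{label}: ${value:.0f}K per 780-equivalent")
```

Under that assumed 45% figure the 750 costs more per unit of compute in both
configurations, so its appeal would rest on the lower absolute price only.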
I am somewhat confused by the divide instruction timings.  There are a couple
of possibilities here:
	1) a stupidity in the 780 was fixed in the 750
	2) I muffed the 780 test (don't think so, as I triple checked it)
	3) I muffed the 750 test.
I find it incredible that MULL is 4x as fast on the 780 while DIVL is a bit
slower on the 780.  I did not do any floating point tests, as there is no
floating point accelerator on the 750.
David

-------------------------------------------------------------------------------

VAX-11/750 vs VAX-11/780
------------------------

Simple 2D convolution program:

		11/750		11/750 (% of 11/780)		11/780
		------		--------------------		------

BLISS-32	5.45 sec		45%			2.5 sec

VMS PASCAL	12.9 sec		38%			4.9 sec

UNIX C		11.3 sec		44%			5.0 sec

UNIX F77	39.9 sec		29%			11.4 sec

Compiled
Franz Lisp	76.5 sec		53%			41.0 sec


Instruction timings:

movl r,r	1000nSec		40%			400nSec
movl x(PC),r	1760nSec		45%			800nSec
movl r,x(PC)	2300nSec		52%		       1300nSec
movl (r),r	1330nSec		60%		        800nSec

Addressing modes:

r		0nSec			--			0nSec
# (short)	0nSec			--			0nSec
# (long)	700nSec			57%			400nSec
(r)		330nSec		       120%			400nSec
(r)+		330nSec		       120%			400nSec
-(r)		330nSec		       120%			400nSec
@(r)+		900nSec		       111%		       1000nSec
x(r)		500nSec			80%			400nSec
@x(r)		1150nSec		86%		       1000nSec
[r]		1000nSec		60%			600nSec
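
The figures above suggest that the time for a simple instruction is roughly
its register-to-register time plus the cost of each addressing mode used.  A
small sketch of that additive model follows.  It uses only numbers from these
tables; the additive model itself is an assumption read off the data rather
than something stated in the message, and the names are illustrative.

```python
# Costs in nSec as (11/750, 11/780), copied from the tables in this message.
MOVL_RR = (1000, 400)          # movl r,r
MODE_COST = {
    "r": (0, 0),
    "(r)": (330, 400),
    "x(r)": (500, 400),
    "@x(r)": (1150, 1000),
}


def estimate_movl(src_mode, dst_mode="r"):
    """Estimate movl time as base time plus both operand-mode costs.

    Assumption: costs simply add; no overlap between fetch and execute.
    """
    src = MODE_COST[src_mode]
    dst = MODE_COST[dst_mode]
    return tuple(base + s + d for base, s, d in zip(MOVL_RR, src, dst))


est_750, est_780 = estimate_movl("(r)")
print(f"movl (r),r  750: {est_750} nSec  780: {est_780} nSec  "
      f"750 speed: {100 * est_780 / est_750:.0f}% of 780")
```

For movl (r),r the estimate reproduces the measured 1330nSec and 800nSec.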

Instructions:

MOVL		1000nSec		40%			400nSec
ADDL
SUBL
etc

MULL		8000nSec		25%			2000nSec
DIVL		8000nSec	       112%			9000nSec
CALLx/RET	20000nSec+	       100%			15000nSec+
		1800nSec/register				2000nSec/Reg
JSB/RSB		6000nSec		50%			3000nSec
SOBGxx		2000nSec		50%			1000nSec
ACBL		5600nSec		71%			4000nSec
MOVC3		350nSec/byte	       107%			375nSec/byte

3 operand	+500nSec		40%			+200nSec
instructions
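
The percentage column appears to be the 11/780 time divided by the 11/750
time.  That reading is an inference from the numbers, not something the
message states.  A short sketch that recomputes it for a few rows and flags
any row where the stated figure differs by more than 2 points (an arbitrary
tolerance chosen for this example):

```python
# (row name, 11/750 time, 11/780 time, stated % of 11/780), from the tables.
ROWS = [
    ("BLISS-32", 5.45, 2.5, 45),
    ("VMS PASCAL", 12.9, 4.9, 38),
    ("UNIX C", 11.3, 5.0, 44),
    ("UNIX F77", 39.9, 11.4, 29),
    ("Franz Lisp", 76.5, 41.0, 53),
    ("movl r,x(PC)", 2300, 1300, 52),
    ("MULL", 8000, 2000, 25),
    ("DIVL", 8000, 9000, 112),
]

TOLERANCE = 2.0  # percentage points; arbitrary choice for this sketch


def percent_of_780(t750, t780):
    """11/750 speed as a percentage of 11/780 speed (inferred definition)."""
    return 100.0 * t780 / t750


mismatches = []
for name, t750, t780, stated in ROWS:
    computed = percent_of_780(t750, t780)
    differs = abs(computed - stated) > TOLERANCE
    if differs:
        mismatches.append(name)
    note = "  <-- differs from stated" if differs else ""
    print(f"{name:14s} stated {stated:3d}%  computed {computed:5.1f}%{note}")
```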

∂06-Apr-81  1153	RPG  	Times    

		kl	780	750	fv1	s1	2080
peak scalar	3	1	0.6	2.8	20	24
ave logic	1.9	0.8	0.5	2.0	?	9.5
float		0.5	0.6	0.2	1.0	80	7 (claimed 12?)
(dbl floating mult)

Peak scalar is instruction drain rate (no memory fetches), ave logic is
non-arithmetic with memory fetches. Floating is floating peak (drain rate),
in MIPs. KL (model a cpu), fv1 is super-vax (not announced).
			-rpg-
∂29-Sep-81  1104	CSVAX.fateman at Berkeley 	Re:  Timing  
Date: 28 Sep 1981 16:45:34-PDT
From: CSVAX.fateman at Berkeley
To: RPG@SU-AI
Subject: Re:  Timing

we ran a particular demo file for VAX macsyma  (available as
mit-mc:demo;begin demo)  on the 11/780 (low usage) and on
the 11/750 (single user) and found that the 11/780 did the job
in 73% of the reported CPU time of the 11/750.  This was not
done with equal amounts of memory or identical disks;  I believe
those factors would tend to favor the 780.  This is not
the DEC "SUVAX" of earlier times, which is considerably slower than
the 750.  I suspect that with the floating point accelerator, the
750 would be quite nice; especially as memory up to 8 megabytes
will be available with 64K RAM chips.

∂05-Oct-81  1245	SL   
A VAX system is requested to handle the capabilities of ACRONYM, the integrated
vision system.  ACRONYM now includes 2MB of system.  
We expect it to expand as we add database facilities and new code.  It currently
accommodates only small pictures, 256x256.  With large pictures, the address
space will be much larger.  Current usage of the group is approximately 72% of
a VAX 11/780,
or 125% of a VAX 11/750.  Typical experiments require 10 minutes now on a KL/10,
equivalent to 30 minutes on a VAX 11/780.  Although it is planned to make ACRONYM
more efficient, it is essential that execution time be in the range of
5 minutes for adequate debugging.
Compute power is important, hence the floating point accelerator (3% of total
cost) and interleaved memory (a second memory controller, 7% of total cost).
For multiple users of large LISP systems, large memory is essential
because both VMS and UNIX impose heavy penalties in paging.  
Fortunately memory is inexpensive (9.5% of total cost).
Current disk usage is estimated at 300 megabytes for reasonable functioning.
That includes inadequate storage for pictures.  
A second disk is essential for adequate system performance in paging to avoid
disk contention, and a second controller makes a noticeable difference in 
performance. The large disk costs only 10% more than a smaller disk and provides
room for growth and for storage of data base and images.
Pictures and archival storage will use a tape drive along with storage at SAIL
over the network.
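
The usage and timing figures in this request can be cross-checked with a
little arithmetic.  The sketch below assumes that load and run time scale
linearly with CPU speed, which the request does not state; the variable names
are illustrative.

```python
# Figures quoted in the request.
usage_of_780 = 0.72      # group load as a fraction of one VAX 11/780
usage_of_750 = 1.25      # same load as a fraction of one VAX 11/750
minutes_on_780 = 30      # typical experiment, stated 780 equivalent
target_minutes = 5       # desired turnaround for debugging

# Assumption: load and run time scale linearly with CPU speed.
implied_750_speed = usage_of_780 / usage_of_750
minutes_on_750 = minutes_on_780 / implied_750_speed
speedup_needed = minutes_on_780 / target_minutes

print(f"implied 750 speed: {100 * implied_750_speed:.1f}% of a 780")
print(f"typical experiment on a 750: about {minutes_on_750:.0f} minutes")
print(f"speedup needed over a 780 to reach target: {speedup_needed:.0f}x")
```

The implied ratio of roughly 58% is close to the 60% figure reported for
memory-heavy Lisp code elsewhere in this file.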

∂28-Jan-82  1221	pratt@Shasta (SuNet) 	750 <--> 780 timings   
Date: 28 Jan 1982 11:57:46-PST
From: pratt at Shasta
To: equip, sun
Subject: 750 <--> 780 timings

I'd forgotten all about the following item until I stumbled across it just
now.  Of interest now that we're expecting 750's.

 7-May-81 10:41:55-PDT,6040;000000000001
Mail-from: ARPANET site SUMEX-AIM rcvd at 7-May-81 1041-PDT
Date:  7 May 1981 1029-PDT
From: Rindfleisch@SUMEX-AIM
Subject: FYI RE VAX 750/780 BENCHMARKS (KASHTAN)
To:   SUMEX STAFF:
cc:   ADMIN.GORIN@SU-SCORE, CSL.FB@SU-SCORE, PRATT@SUMEX-AIM,
cc:   CSL.LANTZ@SU-SCORE, CSL.BKR@SU-SCORE, MOGUL@SCORE,
cc:   NOWICKI@SCORE

Mail-from: ARPANET host SRI-AI rcvd at 7-May-81 0009-PDT
Date:  7 May 1981 0001-PDT
From: Wilcox at SRI-AI (Clark Wilcox)
Subject: [KASHTAN: VAX-11/750 <--> VAX-11/780 benchmarks]
To: rindfleisch at SUMEX-AIM
cc: Wilcox at SRI-AI

Thought you might be interested.
                ---------------
Date:  6 May 1981 2331-PDT
From: KASHTAN
Subject: VAX-11/750 <--> VAX-11/780 benchmarks
To: quam, witkin, hanson, jirak, wilcox, meyers, larson, kennard, sad,
    heathman at SRI-AI, ryland at SRI-AI, burback at SRI-AI, mcghie,
    sword

Here are the complete results of the 11/750 - 11/780 benchmarks.  Looks
like the 11/750 gets to memory faster (and is optimized w.r.t. getting
to memory faster) than the 11/780.  It loses VERY badly when it comes to
actually executing instructions, as the execution unit is very much slower
in the 750 than the 780.  This is particularly borne out by the execution
benchmarks for the convolution program in various languages.  The languages
vary from BLISS (which keeps the whole world in registers) to LISP (which
keeps the whole world in memory).  Even though the 750 gets to memory faster,
it doesn't do you much good when it takes so long to process what you got
from memory (even a simple move).
The 750 does a good job of operand processing (especially given its relative
CPU speed) but this doesn't seem to help too much in actual program execution,
as on the 750 the execution time seems to be dominated by the instruction
execution time rather than by the operand fetch time (as is the case on
the 780).
A note on Richard Fateman's 750 benchmarks.  Seems that all they did was
run a Liszt (Franz Lisp compiler) compile on one of Bell Labs' UNIX systems.
A compiled Franz Lisp program (as Liszt is) tends to be very heavy on CALLS
and on moving things around in memory (i.e. to and from the stack).  No
intermediate results are kept in registers at all.  What this does is skew
the results somewhat towards a faster looking 750 (since the 750 will benefit
from any benchmarks that are heavily involved in memory referencing).  What
he reported was that the 750 was indeed about 60% of the 780 in this case.
PLEASE NOTE that large IU and VLSI programs, while we might consider them
memory intensive, are really virtual memory intensive (i.e. have very large
working sets).  This is not the same as the above benchmark.  Most IU and
VLSI programs when compiled with good compilers will tend to do a small amount
of computation (even just an add or multiply) with each datum fetched from
memory.  You can expect the performance of the 750 relative to the 780 to
drop quite a bit from the above mentioned 60%.  It should become very much
like the following convolution benchmarks (a very good example of a virtual
memory intensive program that does a small amount of computation with each
datum fetched).  An interesting side note:  CARs and CDRs in compiled lisp
tend to come out as   "movl  x(r),dst"  (which executes at about 60% of 780
speed).
My feeling from playing with the two systems is that the 750 is best used
as an entry level system for those sites which need to acquire the smallest
possible VAX configuration (i.e. the lowest possible price).  An entry level
750 goes for about $90K while an entry level 780 system with approximately
the same configuration would go for about $140K.  Clearly there is a big
difference here (almost all of it in the price of the CPU).  As the systems
get larger the price advantage goes away (as the price will not be dominated
by the CPU price, which is the case in the smaller systems, but by memory /
peripheral prices).  Here you will save about $50K on a $250K system and get
less than 1/2 the machine.
I am somewhat confused by the divide instruction timings.  There are a couple
of possibilities here:
	1) a stupidity in the 780 was fixed in the 750
	2) I muffed the 780 test (don't think so, as I triple checked it)
	3) I muffed the 750 test.
I find it incredible that MULL is 4x as fast on the 780 while DIVL is a bit
slower on the 780.  I did not do any floating point tests, as there is no
floating point accelerator on the 750.
David

-------------------------------------------------------------------------------

VAX-11/750 vs VAX-11/780
------------------------

Simple 2D convolution program:

		11/750		11/750 (% of 11/780)		11/780
		------		--------------------		------

BLISS-32	5.45 sec		45%			2.5 sec

VMS PASCAL	12.9 sec		38%			4.9 sec

UNIX C		11.3 sec		44%			5.0 sec

UNIX F77	39.9 sec		29%			11.4 sec

Compiled
Franz Lisp	76.5 sec		53%			41.0 sec


Instruction timings:

movl r,r	1000nSec		40%			400nSec
movl x(PC),r	1760nSec		45%			800nSec
movl r,x(PC)	2300nSec		52%		       1300nSec
movl (r),r	1330nSec		60%		        800nSec

Addressing modes:

r		0nSec			--			0nSec
# (short)	0nSec			--			0nSec
# (long)	700nSec			57%			400nSec
(r)		330nSec		       120%			400nSec
(r)+		330nSec		       120%			400nSec
-(r)		330nSec		       120%			400nSec
@(r)+		900nSec		       111%		       1000nSec
x(r)		500nSec			80%			400nSec
@x(r)		1150nSec		86%		       1000nSec
[r]		1000nSec		60%			600nSec

Instructions:

MOVL		1000nSec		40%			400nSec
ADDL
SUBL
etc

MULL		8000nSec		25%			2000nSec
DIVL		8000nSec	       112%			9000nSec
CALLx/RET	20000nSec+	       100%			15000nSec+
		1800nSec/register				2000nSec/Reg
JSB/RSB		6000nSec		50%			3000nSec
SOBGxx		2000nSec		50%			1000nSec
ACBL		5600nSec		71%			4000nSec
MOVC3		350nSec/byte	       107%			375nSec/byte

3 operand	+500nSec		40%			+200nSec
instructions